Objectives: In 1995, an outbreak survey in Gozan-dong concluded that an association between fiberglass exposure in
drinking water and a cancer outbreak could not be established. This study follows the subjects of the 1995 survey using a
data linkage method to examine whether such an association existed. The authors also address the potential benefits and
methodological issues of following up outbreak surveys through data linkage, particularly when informed consent is absent.
Methods: This is a follow-up study of 697 (30 exposed) of the original 888 (31 exposed) participants
(78.5%), assessing cancer outcomes and deaths from 1995 to 2007. National Cancer Registry
(KNCR) and death certificate data were linked using the ID numbers of the participants. The standardized incidence ratio
(SIR) for cancers and the standardized mortality ratio (SMR) were calculated by the KNCR.

Results: The SIR values for all cancers and for gastrointestinal (GI) cancers were lowest in the exposed group
(SIR, 0.73; 95% CI, 0.10 to 5.21; 0.00 for GI), while the two control groups (control 1: external, control 2: internal) showed
slightly elevated SIR values (1.18 and 1.27 for all cancers; 1.62 and 1.46 for GI). None was statistically
significant. All-cause mortality in the three groups showed the same pattern (SMR, 0.37, 1.29, and 1.11).
Conclusions: With a 13-year follow-up, this study did not refute the earlier finding of non-association. Considering that many
outbreak surveys have a small sample size and a cross-sectional design, follow-up studies that use data
linkage should become standard procedure.

INTRODUCTION

An outbreak survey in the Gozan-dong area of Incheon
was initiated after local residents filed a health-hazard
lawsuit in 1994. Glass fibers dispersed from a factory
producing glass fiber insulation material were the alleged
cause of health hazards, including stomach cancer. An
association between stomach cancer and glass fiber
intake through contaminated water was reported [1],
which escalated the case into a public health issue. A
large-scale survey ("the survey") was launched with the
support of the Incheon city government [2,3]. Unlike
those of asbestos, the health effects of glass fiber were
largely unknown, and the survey team developed
exposure assessment methods through international
collaboration [4,5]. In the survey, the main exposure of
interest was the extent of glass fiber intake through
contaminated water [2]. Through analyses of the
drinking water sources, 31 individuals were classified as
exposed. The 1995 survey team brought together
multidisciplinary expertise [6-9] and became a model for
similar investigations [10,11]. The survey, however, had
too few exposed subjects, so its negative results could be
interpreted either as a small effect size or as a lack of
power. Insufficient power was not limited to this survey;
it is a general problem of local outbreak surveys because
of their limited sample sizes and cross-sectional designs
[6,10,12]. It is often infeasible to confirm or refute
alleged associations from the initial survey alone [13,14].

Previous studies, including the 1995 survey, were
conducted without obtaining written informed consent.
However, when personal information is available for
research purposes, the current policy of public institutes
allows researchers to generate group statistics after
individual-level information has been deleted. In Korea,
numerous epidemiologic studies have established
associations between risk factors and diseases by using
open data sources or data linkage methods. In the field
of environmental epidemiology, however, these efforts
have been very limited. We decided to follow the 1995
participants by restoring the personal information from
the baseline survey. Follow-up studies performed with
data collected without individual consent raise
methodological issues as well as ethicolegal
considerations. The authors present the findings of the
follow-up study together with the methodological issues
involved, so that our experience can be used in similar
studies.

METHODS 

I. Participants 

The same criteria for classifying exposure status were
used as in the initial survey, assuming that no additional
exposure occurred, because the glass fiber factory was
shut down in 1995 [2,3]. Among the 31 exposed subjects
and 857 controls, those with a valid personal ID number
were included in this study. The exposure and control
groups were defined as follows: those who used water
contaminated by glass fiber were defined as exposed
(n=31); the external control group (control 1) included
those who lived nearby but were not a party to the glass
fiber suit (n=642); the internal control group (control 2)
comprised residents who were active in the case but
whose drinking water was free of glass fiber
contamination (n=215). We assumed that comparisons
between the exposed and external control groups would
reflect both the effects of glass fiber and possible
interest in compensation, whereas comparisons between
the exposed and internal control groups would capture
the health effects of glass fiber only.

II. Methods of Follow-Up 

After obtaining approval from the institutional
review board, we sent the personal IDs of all available
records to the National Cancer Registry of the National
Cancer Center Korea and to the Korean Statistical Office,
where follow-up for cancers and mortality was
performed. The follow-up period ran from 1995 to the
end of 2007 (13 years). The cancer and mortality
occurrences during the period were provided as
indirectly standardized rates by exposure status, after
personal information had been deleted. We selected
disease codes C00-C97 for all cancers and C15-C20 for
gastrointestinal tract cancers, according to the
International Classification of Diseases and Related
Health Problems, 10th revision (ICD-10).
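
The linkage step described above can be pictured with a minimal sketch. It assumes a hypothetical record layout (pairs of personal ID and exposure group) and a hypothetical set of IDs found in an outcome registry; neither the function name nor the data layout comes from the registries themselves. The point it illustrates is that only group-level counts leave the linkage step.

```python
from collections import Counter


def aggregate_linked_outcomes(cohort, registry_ids):
    """Count linked outcomes per exposure group and drop personal IDs.

    cohort: iterable of (person_id, group) pairs (hypothetical layout).
    registry_ids: set of person IDs found in the outcome registry.
    """
    totals = Counter()
    events = Counter()
    for person_id, group in cohort:
        totals[group] += 1
        if person_id in registry_ids:
            events[group] += 1
    # Only group-level counts are returned; no personal ID leaves this function.
    return {g: {"n": totals[g], "events": events[g]} for g in totals}


# Made-up IDs for illustration only.
cohort = [("id-001", "exposed"), ("id-002", "control 1"),
          ("id-003", "control 1"), ("id-004", "control 2")]
registry_ids = {"id-002", "id-004"}
summary = aggregate_linked_outcomes(cohort, registry_ids)
```

Returning a plain dictionary of counts mirrors the aggregate-only output described in the text; a real linkage would also need to handle invalid or duplicate IDs.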

III. Standardized Incidence Ratio and
Standardized Mortality Ratio
Calculation

For the 13-year follow-up period, we reconstructed the
cumulative observation time in person-years, following
the increase in age (5-year age bands). For example, a
man who was 63 years old at the survey contributed two
years (two person-years) to the 60-64 age group in 1995
and 1996, five years to the 65-69 age group between
1997 and 2001, and his remaining years to the
subsequent age groups. In this manner, the cumulative
person-time for calculating standardized rates was
computed by age and sex. We conducted indirect
standardization of the incidence and mortality rates
based on the 2001 mortality rates in Koreans and the
2001 end-of-year census population structure.
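
The age-band allocation in the example above can be sketched as follows. This is a simplified illustration, assuming whole calendar years, one person-year per calendar year, and complete follow-up through 2007 (no censoring at death or diagnosis); the function name and parameters are ours.

```python
def person_years_by_age_band(age_at_entry, start_year=1995, end_year=2007, width=5):
    """Split follow-up into person-years per age band, one year per calendar year."""
    contributions = {}
    for offset in range(end_year - start_year + 1):
        age = age_at_entry + offset          # age attained in this calendar year
        lower = (age // width) * width       # lower bound of the 5-year band
        band = f"{lower}-{lower + width - 1}"
        contributions[band] = contributions.get(band, 0) + 1
    return contributions


# The 63-year-old man from the text: 2 years in 60-64, 5 years in 65-69, and so on.
example = person_years_by_age_band(63)
```

Summing such dictionaries over all subjects, separately by sex, would give the person-time denominators to which the reference rates are applied.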

The standardized incidence ratio (SIR) and standardized
mortality ratio (SMR) were calculated by dividing the
observed numbers of cases and deaths by the expected
numbers estimated from the population rates. The 95%
confidence intervals (CIs) of the standardized ratios were
calculated with the method of Breslow and Day
(Formula 1) [15].

Formula 1: 95% CIs of the standardized rate ratio
$SRR \times \{1 \pm z_{\alpha/2} \times \sqrt{O/E^{2}}\}$

SRR: standardized rate ratio, O: total number of observed deaths,
E: total number of expected deaths from indirect standardization
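
As a worked illustration of the ratio itself, the sketch below divides observed by expected counts and attaches an interval. The interval uses a common log-scale approximation (standard error of the log ratio taken as 1/sqrt(O)); that choice is our assumption for the sketch and is not necessarily the form printed as Formula 1. With the all-cancer counts of the exposed group in Table 2 (1 observed, 1.36 expected), it gives values close to the reported 0.73 (0.10 to 5.21).

```python
import math


def standardized_ratio(observed, expected, z=1.96):
    """Return (ratio, lower, upper) for an SIR or SMR.

    Assumption: log-scale interval with SE(ln ratio) ~ 1/sqrt(observed).
    With zero observed events the interval is undefined and None is returned.
    """
    ratio = observed / expected
    if observed == 0:
        return ratio, None, None
    factor = math.exp(z / math.sqrt(observed))
    return ratio, ratio / factor, ratio * factor


# Exposed group, all cancers (Table 2): 1 observed case, 1.36 expected.
sir, lower, upper = standardized_ratio(observed=1, expected=1.36)
```

With a single observed case the interval spans more than an order of magnitude, which is the small-sample problem the text returns to in the Discussion.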

Considering the difference in follow-up loss between
the exposed and control groups, we estimated a
corrected SIR for the controls, assuming that no
additional cancer cases would have been found had
follow-up been complete. The corrected SIR is thus the
lowest possible SIR, and if the SIR of the exposed group
is lower than the corrected SIR of the controls, we can
exclude the possibility that the SIR is "higher" among
the exposed.

Table 1. General characteristics of the original subjects and of those available for the follow-up study, by
exposure status (external control=control 1, internal control=control 2)

Classification   Item              Completed survey   Followed              Insufficient ID
                                   in 1995            (follow-up rate %)    information
Exposed          No. of subjects   31                 30 (96.7)             1
                 A                 35.0               35.4                  33.0
                 M                 38.7               40.0                  0.0
                 S                 29.0               26.6                  0.0
                 D                 0.0                0.0                   0.0
Control 1        No. of subjects   642                489 (76.2)            153
                 A                 32.8*              33.7*                 26.2*
                 M                 46.8               46.6                  48.4
                 S                 34.3               35.1                  31.2
                 D                 24.4               26.2                  19.0
Control 2        No. of subjects   215                178 (82.7)            37
                 A                 39.1*              39.2*                 38.7
                 M                 46.0               49.7                  21.7*
                 S                 34.4               36.7                  15.7*
                 D                 22.7               24.0                  11.?*
Total            No. of subjects   888                697 (78.5)            191
                 A                 37.4               37.6                  36.8
                 M                 46.5               46.9                  44.5
                 S                 34.2               35.1                  29.3
                 D                 24.2               26.2                  18.6

A: mean age (years), M: men %, S: smoker %, D: drinker %.

Bold: significant difference between followed and not-followed subjects in each exposure group (p<0.05, by chi-square test, d.f.=1).
* significantly different distribution compared with the same stratum of the other exposure groups (p<0.05, by chi-square test, d.f.=2).

Table 2. Standardized cancer incidence ratios for all cancers and gastrointestinal cancers during the follow-up
period of 1995-2007, by exposure status

Group (n)            Cancer type    Observed   Person-years      Expected No.   SIR (95% CI)        SIR adjusted
                                    cases      of observation    of cases                           for f/u rate⁴
Exposed (30)         All cancer²    1          374.8             1.36           0.73 (0.10-5.21)    0.70
                     GI cancer³     0          380.9             0.46           NA
Control 1¹ (489)     All cancer     29         5932.5            24.5           1.18 (0.82-1.71)    0.92
                     GI cancer      14         5984.1            8.62           1.62 (0.96-2.74)
Control 2¹ (178)     All cancer     8          2191.3            6.29           1.27 (0.64-2.55)    1.07
                     GI cancer      3          2192.9            2.06           1.46 (0.47-4.52)
All subjects (697)   All cancer     38         8498.6            32.2           1.18 (0.86-1.62)    NA
                     GI cancer      17         8558.0            11.2           1.53 (0.95-2.45)

SIR: standardized incidence ratio of cancer occurrence, CI: confidence interval, f/u: follow-up, NA: not available.

¹ control 1: external control, control 2: internal control; ² all cancer cases, ICD-10 codes C00-C97; ³ all gastrointestinal
cancer cases, ICD-10 codes C15-C20; ⁴ adjusted SIR: SIR calculated assuming that all subjects were followed up and no more
cases were found (minimal SIR).



RESULTS 

Among the original 888 participants, the personal IDs
of 697 individuals (78.5%) were valid. The number of
followed subjects by exposure status is presented in
Table 1. While all but one of the subjects in the exposed
group were included, the external and internal control
groups showed follow-up rates of 76.2% and 82.7%,
respectively. When we compared the sex ratio and the
prevalence of smoking and drinking, those followed in
the internal control group (control 2) included more
men, more smokers, and more heavy drinkers than those
lost in the same group, whereas the exposed and external
control groups showed no significant differences
between those followed and those lost.

The SIRs for all cancers and for digestive system
cancers were estimated as follows: 0.73 (95% CI, 0.10 to
5.21) and 0.00 (no cases) for the exposed group; 1.18
(95% CI, 0.82 to 1.71) and 1.62 (95% CI, 0.96 to 2.74)
for the external control group; and 1.27 (95% CI, 0.64 to
2.55) and 1.46 (95% CI, 0.47 to 4.52) for the internal
control group (Table 2). The corrected SIR assuming
complete follow-up was 0.92 in the external control
group.

The SMRs for all-cause death were 0.37 (95% CI,
0.05 to 2.67), 1.29 (95% CI, 0.78 to 2.15), and 1.11 (95%
CI, 0.84 to 1.45) for the exposed, external control, and
internal control groups, respectively. The SMR in the
exposed group was lower than in the two control groups,
without statistical significance (Table 3).



Table 3. Standardized mortality ratios for all-cause death during the follow-up period of 1995-2007, by
exposure status (external control = control 1, internal control = control 2)

Group          Observed mortality   Person-years      Expected No.   SMR (95% CI)
               from any cause       of observation    of cases
Exposed        1                    380.9             2.65           0.37 (0.05-2.67)
Control 1      52                   6060.4            11.58          1.29 (0.78-2.15)
Control 2      15                   2213.2            47.01          1.11 (0.84-1.45)
All subjects   68                   8654.6            61.25          1.11 (0.87-1.40)

SMR: standardized mortality ratio from all-cause death, CI: confidence interval.

Cause-specific mortality rates were not provided because of privacy protection regulations.

DISCUSSION

The findings from the 13-year follow-up did not refute
the non-association. When the results of this follow-up
study are compared with those of 1995, the original
trend of higher SIR and SMR in the exposed group was
reversed, although none of the results was statistically
significant.

There are two explanations for the lack of
significance: insufficient power and a true
non-association. If we assume a 50-year follow-up of 30
exposed and 600 control individuals and a 30% lifetime
risk of all cancers in the control group, the effect size
would have to exceed 2.0 to reach significance (with
80% power and a 5% alpha error level). The other
explanation is a lack of association or a very small effect
size. If the point estimates of the SIRs and SMRs had
been higher in the exposed group, we could not have
determined whether the findings resulted from
insufficient power or from an effect size smaller than
detectable (or no association). The lower SIRs and SMR
in the exposed group, however, argue against an
"increased risk" in the exposed group, although they
cannot prove a "smaller risk".
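
The order of magnitude of the power argument can be checked with a rough sketch. It compares two cumulative risks with a normal approximation (unpooled standard error, upper tail only), using the group sizes and the 30% control risk stated above. This is our simplification, not the authors' calculation, and a normal approximation is crude for a group of 30.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def approx_power(risk_control, relative_risk, n_exposed, n_control, z_alpha=1.96):
    """Approximate power to detect a risk ratio with a two-sided 5% test.

    Assumption: Wald-type comparison of two cumulative risks; requires
    risk_control * relative_risk to stay below 1.
    """
    risk_exposed = risk_control * relative_risk
    se = math.sqrt(risk_exposed * (1 - risk_exposed) / n_exposed
                   + risk_control * (1 - risk_control) / n_control)
    return normal_cdf(abs(risk_exposed - risk_control) / se - z_alpha)


for rr in (1.5, 2.0, 2.5):
    print(rr, round(approx_power(0.30, rr, 30, 600), 2))
```

Under these assumptions the power stays well below 80% for a risk ratio of 1.5 and passes 80% by a risk ratio of about 2, which is in line with the threshold quoted in the text.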

Several explanations are possible for the reversal of
the cancer SIRs between the exposed and controls
relative to 1995. First, the 1995 survey, as a
cross-sectional study, could have been influenced by
information errors. Over- or under-reporting in either
group was possible because the cancer SIRs in 1995
depended solely on the questionnaire. Second, chance
may explain part of the findings. A third possibility is
the "harvesting effect": if excess mortality actually
existed in a small population, the excess deaths would
lead to reduced mortality in the near future. However, it
is unlikely that both the lower SIR and the lower SMR
can be explained by the harvesting effect alone,
considering the relatively long follow-up period.
Harvesting effects are well documented in time-series
analyses of the health effects of air pollution. Methods
to detect and adjust for harvesting effects in small-group
follow-up studies remain a future task.

Some methodological issues are worth addressing.
First, differences in the follow-up rate by exposure
status need to be handled. In this study the follow-up
rate was lower in the controls, and we estimated
corrected SIRs in order to calculate the minimum SIR
level. The method of calculating corrected SIRs depends
on the situation. If the exposed have higher SIRs, the
corrected SIR in the controls should take the "maximum"
SIR, obtained by applying the highest level of incidence
(the upper confidence limit of the SIR) to those lost,
and that in the exposed should be "minimal". In this
study, by showing that the SIRs in the exposed were
lower than the minimum corrected SIR of the controls,
we could conclude that the SIRs in the exposed are "not
higher" than those in the control groups. This
sensitivity analysis can be applied to draw logical
inferences from incomplete data.
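
The minimum corrected SIR can be approximated with a short sketch. It assumes that the expected number of cases scales in proportion to group size when the lost subjects are added back with no new cases; this proportional scaling is our simplification. For the external control group it gives about 0.90, close to but not identical with the 0.92 in Table 2, so the published figure was presumably derived from more detailed expected counts.

```python
def minimal_sir(observed, expected_followed, n_followed, n_original):
    """Lower-bound SIR if lost subjects were followed and contributed no cases.

    Assumption: expected cases grow in proportion to the number of subjects.
    """
    expected_all = expected_followed * n_original / n_followed
    return observed / expected_all


# External control group, all cancers (Tables 1 and 2).
control1_min = minimal_sir(observed=29, expected_followed=24.5,
                           n_followed=489, n_original=642)
```

Comparing this lower bound with the exposed group's SIR of 0.73 reproduces the logic of the sensitivity analysis: the exposed estimate stays below even the most conservative control estimate.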

Second, there is an ethicolegal issue for the follow-up
study: is it possible to follow previously recruited study
participants who have not submitted explicit informed
consent for data linkage? The concept of written
informed consent was not established in 1995. The
authors did provide written information to the
participants, and participation itself was regarded as
consent at that time. This is, however, not sufficient
under current ethical standards. The current data
management policy of the National Cancer Registry
(KNCR) reconciles insufficient consent with research
needs by providing aggregate data, after deleting any
personal information, when the purpose of the research
is approved. The current privacy protection policy,
however, limits some types of analysis, particularly
cause-specific mortality analysis. By the same token,
information on rare types of cancer will not be provided.
Whether cause-specific mortality or rare cancer
occurrences can be provided as statistical data remains
an issue of research ethics.

Vital statistics data and cancer registry data in Korea
are well documented for their validity and reliability
[16-18]. Until the late 1990s, about 20% of death records
in the vital statistics lacked a diagnosis by a medical
doctor [16]. Deaths from cancer, however, have a higher
rate of medical certification [17], and it is unlikely that
the SMRs from cancer were affected by the quality of
vital statistics before the late 1990s. The KNCR data
since the mid-1990s have met international standards in
terms of completeness and validity [18].

In the 1995 survey, the authors noted a possible
increase in benign soft tissue tumors in the exposed. This
finding, however, could not be examined by the current
data linkage method. When the health effects of interest
are subclinical pathologic changes, a systematic biobank
would be helpful. Regardless of the initial decision,
further knowledge will allow new approaches to
examining associations using collected biospecimens. In
the 1995 survey, although some biospecimens were
collected, their limited quality and quantity did not
allow new analyses. The authors suggest that long-term
follow-up become standard practice for any outbreak
survey. Additionally, we suggest that the construction of
a systematic biobank, as well as obtaining informed
consent for data linkage, be included in the standard
practice of outbreak surveys to improve the current
ability to detect or refute associations.